Skip to content

Generalize 6D canonical integrator to arbitrary curvilinear coordinates (VMEC)#408

Closed
krystophny wants to merge 61 commits into
refactor/shared-symplectic-corefrom
feature/simple-6d-port
Closed

Generalize 6D canonical integrator to arbitrary curvilinear coordinates (VMEC)#408
krystophny wants to merge 61 commits into
refactor/shared-symplectic-corefrom
feature/simple-6d-port

Conversation

@krystophny

@krystophny krystophny commented Jun 19, 2026

Copy link
Copy Markdown
Member

Risk tier

  • T3: physics, output behavior, coordinate convention

Correctness contract

Intended behavior change

Add orbit_cpp_canonical: a curvilinear canonical-midpoint 6D integrator on the
analytic tokamak, the SIMPLE port of the three Egger-Feiel thesis
discrete-variational schemes. It supersedes the Cartesian orbit_cpp_pauli
discretization (wrong scheme: flat-metric Pauli, not the thesis curvilinear
canonical midpoint).

Three models, integer-dispatched (GPU-portable, !$acc routine seq, fixed-size
6 state, analytic Jacobian, no class()/proc-ptr in the hot path):
MODEL_CP (full charged particle, dt=1), MODEL_CPP_SYM (Pauli symplectic
midpoint, H+mu|B|, dt=80), MODEL_CPP_VAR (Pauli variational midpoint, discrete
Euler-Lagrange, dt=800). The 6x6 Newton system reuses rk_solve (device LU)
from orbit_rk_core.

field_can_test gains eval_field_correct_test, the exact-curl analytic field
(B^k = eps^ijk A_j,i/sqrtg, |B| = sqrt(g_ij B^i B^j)) the 6D models need.
A, dA, d2A match eval_field_test; B, dB, h differ.

Behavior that must not change

The guiding-center integrator and orbit_symplectic are untouched. The GC
symplectic tests (test_sympl*, test_orbit_symplectic_base) and the existing
CPP/full-orbit tests pass unchanged.

Coordinate / unit conventions

Canonical curvilinear (r, theta, phi) with covariant p. Thesis normalization
e = m = c = 1 (the module uses a local c=1, not the CGS speed of light in
util, which would null the magnetic coupling). Metric
g = diag(1, r^2, (R0 + r cos theta)^2).

Numerical invariants

CPP-sym/var conserve energy with a bounded, dt-independent oscillation (no
secular drift) and conserve the canonical toroidal momentum p_phi exactly
(axisymmetric field). mu is a fixed parameter for the CPP models.

Tests added

  • unit: test_cpp_canonical (per-step oracle reproduction, energy bound,
    dt-independent plateau, p_phi invariant on the GC banana, analytic==FD
    Jacobian for all three models)

Golden-record impact

  • unchanged

New self-contained models on the analytic tokamak; no VMEC production path or
golden record is touched.

Failure modes considered

The python reference metric drops the factor r in d g_33/d theta. Using it
breaks the symplectic energy bound. The Fortran uses the correct analytic
derivative; the failing-before/passing-after contrast is in Verification. The
python field d|B|/d theta also omits a chain-rule term; the residual keeps the
python form to reproduce the oracle bit-for-bit, and the O(mu) Jacobian term
differentiates that same dBmod by finite difference for consistency.

Manual validation

Oracle regenerated headless from field_correct_test.field with
scipy.optimize.root(hybr, tol=1e-12), numpy 2.4.6 / scipy 1.17.1, corrected
metric. The Fortran reproduces every reference number.

Verification

Build: make CONFIG=Fast (gfortran, Fast profile).

Failing-before (buggy python metric: d g_33/d theta = -2 (R0+r cos th) sin th,
factor r dropped), injected into the same kernel:

CP max|dE/E0| (1000 steps) =   3.9519E-01
FAIL  CP step1  maxerr=  1.04E-06 tol=  1.00E-10
FAIL  CPPsym step10  maxerr=  2.75E-02 tol=  1.00E-10

Passing-after (correct metric d g_33/d theta = -2 r (R0+r cos th) sin th):

PASS  CP step1  maxerr=  2.22E-16
PASS  CP step5  maxerr=  2.22E-16
  CP max|dE/E0| (1000 steps) =   1.2450E-02
PASS  CPPsym step1  maxerr=  9.15E-17
PASS  CPPsym step10  maxerr=  5.55E-15
  CPPsym max|dE/E0| =   9.9928E-04  end-drift =  -1.2791E-05
PASS  CPPsym energy bound ~1e-3
PASS  CPPvar step2000  maxerr=  2.83E-08
  CPPsym plateau dt=80  max|dE/E0|=9.9928E-04
  CPPsym plateau dt=40  max|dE/E0|=1.0004E-03
  CPPsym plateau dt=20  max|dE/E0|=1.0008E-03
  CPPsym plateau dt=10  max|dE/E0|=1.0009E-03
PASS  CPPsym plateau dt-independent (no secular growth)
  banana r band = 3.0977E-02  p_phi drift = 0.0000E+00
PASS  banana p_phi invariant (<1e-12)
PASS  CP-jac analytic==FD       (max|Jan-Jfd| = 2.04E-10)
PASS  CPPsym-jac analytic==FD   (max|Jan-Jfd| = 1.60E-08)
PASS  CPPvar-jac analytic==FD   (max|Jan-Jfd| = 5.92E-11)
 ALL CANONICAL 6D PORT TESTS PASSED

No GC regression (ctest -R 'test_sympl|test_orbit_symplectic_base|test_cpp|test_fo_symplectic|test_full_orbit|test_orbit_model_dispatch'):

100% tests passed, 0 tests failed out of 14

Full unit suite: 100% tests passed, 0 tests failed out of 24.


Update: arbitrary curvilinear coordinates on real VMEC

The diagonal analytic-tokamak metric is replaced by a full (non-diagonal) metric
path. orbit_cpp_canonical now carries g_ij, g^ij, dg_ij,k, covariant
A_i + gradient, |B| + gradient, h_i in a block_t and runs the general
Hamiltonian
q_dot^k = g^kj v_j/m, p_dot_k = qc A_j,k v^j + (m/2) g_ij,k v^i v^j - mu |B|,k.
The diagonal toroidal metric is the special case (off-diagonals zero), so the
analytic oracle is still reproduced bit-for-bit.

Two coordinate blocks, integer-dispatched:

  • COORD_TOK: analytic toroidal metric + exact-curl field, inline,
    !$acc routine seq, class-free, analytic Jacobian. GPU-portable.
  • COORD_VMEC (orbit_cpp_vmec_metric): full metric g_ij/g^ij and
    Christoffel symbols from libneo (issue Add Python venv setup script and fix pyproject.toml #322, feature/metric-christoffel),
    with metric derivatives from compatibility
    dg_ij,k = g_il Gamma^l_jk + g_jl Gamma^l_ik; covariant A_i and |B| from
    SIMPLE's native VMEC field. Host-side (libneo class() + 3D splines); Jacobian
    by FD of the same residual.

The fetched libneo is pinned to feature/metric-christoffel by default until #322
merges (override with -DLIBNEO_REF=...). SIMPLE's own
cartesian_coordinate_system_t gains the christoffel binding the new libneo
base interface requires (flat: all symbols zero).

Honest limitations

  • The COORD_VMEC residual/Jacobian are host-side, not GPU-offloadable: libneo's
    metric is class()-dispatched and reads 3D splines. The COORD_TOK leaf kernels
    remain !$acc routine seq.
  • The test stellarator is not axisymmetric, so the toroidal canonical momentum is
    not conserved; the test asserts the Hamiltonian energy and the radial band, not
    p_phi.
  • Near the magnetic axis s -> 0 the flux metric is singular and the
    central-difference field gradients lose accuracy; the VMEC test starts at
    mid-radius s = 0.3.

Verification (curvilinear / VMEC)

test_cpp_canonical (analytic oracle, generalized full-metric code):

PASS  CP step1  maxerr=  2.22E-16
PASS  CPPvar step2000  maxerr=  2.83E-08
PASS  CPPsym plateau dt-independent (no secular growth)
PASS  banana p_phi invariant (<1e-12)
PASS  CP-jac analytic==FD   (max|Jan-Jfd| = 2.04E-10)
 ALL CANONICAL 6D PORT TESTS PASSED

test_cpp_vmec (CP + big-step CPP on test_data/wout.nc, nfp=2 stellarator):

  metric relative asymmetry =   0.0000E+00
PASS  metric g g^-1 = I            (|g g^-1 - I| = 5.6E-16)
PASS  |B| at VMEC scale (1e4..1e5 G)   (|B| = 5.62E+04 G)
  CP VMEC max|dE/E0| =   1.8112E-07  end-drift =  -1.8112E-07
PASS  CP VMEC energy bounded (<1e-2)
PASS  CP VMEC no secular drift (<5e-3)
  CPP banana s band =   1.7594E-03  max|dE/E0| =   1.9026E-09
  CPP banana s in [0.2999, 0.3017]  radial turning points = 8
PASS  CPP banana confined (0.05<s<0.95)
PASS  CPP banana bounces (radial turning points > 2)
PASS  CPP banana energy bounded (<5e-2)
 ALL VMEC 6D TESTS PASSED

No GC regression. Full fast suite (make CONFIG=Fast test-fast):

100% tests passed, 0 tests failed out of 61

Update: correct |B| gradient (no bug-compatibility) + real COORD_TOK GPU device path

Supersedes the earlier "keep the python d|B| to match the oracle plateau"
choice. Two confirmed bugs fixed with correct physics.

FIX A: correct d|B| and analytic Hessian

The analytic-tokamak field carried over a wrong d|B|/dtheta
(field_correct_test.py dropped the A_theta,rth chain-rule term). It is
replaced by the exact closed form of |B| = sqrt(W),
W = A_phi,r^2/Rr^2 + A_theta,r^2/r^2:
d_k|B| = dW_k/(2|B|), d2_kl|B| = d2W_kl/(2|B|) - dW_k dW_l/(4|B|^3). The
reference field_correct_test.py is patched to match. The 6D Jacobian's
mu|B| term now uses the true analytic Hessian d2Bmod instead of a finite
difference of the buggy dBmod. The python oracle was regenerated with the
corrected field.

Field FD-verification (analytic vs central difference of |B|, gfortran):

  pt              dBr_err   dBth_err   Hrr_err   Hrth_err  Hthth_err
  ( 0.10, 1.50)  4.32E-11  4.96E-12  1.15E-09  3.65E-12  1.31E-11
  ( 0.14, 0.91)  3.99E-12  7.57E-11  6.92E-10  7.08E-12  6.20E-13
  ( 0.30, 2.10)  1.42E-10  1.22E-10  2.72E-10  4.08E-11  3.02E-11
  ( 0.42, 3.00)  1.22E-10  4.55E-11  2.80E-10  5.29E-12  6.52E-11
  ( 0.05, 0.20)  5.84E-11  5.30E-11  1.17E-09  1.25E-12  5.93E-14

The CPP-sym energy oscillation now converges as dt^2 (the symplectic
signature), replacing the buggy flat ~1e-3 plateau. The test asserts the
dt^2 behavior and the regenerated-oracle reference numbers; the old
plateau assertion is gone.

PASS  CP step1  maxerr=  2.22E-16
PASS  CPPsym step10  maxerr=  6.44E-15
  CPPsym max|dE/E0| =   4.2034E-05  end-drift =   7.2829E-07
PASS  CPPsym energy bound ~4e-5
PASS  CPPvar step2000  maxerr=  2.65E-08
  CPPsym dt=  80.0  max|dE/E0|=  4.2034E-05
  CPPsym dt=  40.0  max|dE/E0|=  1.0274E-05
  CPPsym dt=  20.0  max|dE/E0|=  2.6024E-06
  CPPsym dt=  10.0  max|dE/E0|=  7.3595E-07
  CPPsym ratio dt80/dt40 =  4.091
  CPPsym ratio dt40/dt20 =  3.948
  CPPsym ratio dt20/dt10 =  3.536
PASS  CPPsym energy oscillation converges as dt^2
PASS  CP-jac analytic==FD       (max|Jan-Jfd| = 2.04E-10)
PASS  CPPsym-jac analytic==FD   (max|Jan-Jfd| = 3.81E-08)
PASS  CPPvar-jac analytic==FD   (max|Jan-Jfd| = 6.25E-11)
 ALL CANONICAL 6D PORT TESTS PASSED

(Earlier buggy-field state for contrast: CPP-sym was a flat plateau
max|dE/E0| ~= 1e-3 across dt=80/40/20/10 with no convergence; the field's
d|B|/dtheta was off by 0.9% at (0.1,1.5), 5.0% at (0.3,2.1).)

FIX B: real COORD_TOK GPU device path

cpp_canon_step_tok is the device entry. The whole COORD_TOK chain is
!$acc routine seq with fixed-size state and integer dispatch, no
class()/proc-ptr: residual_tok, eval_block_tok,
eval_field_correct_test, dLdq, raise, residual_blk,
jacobian_analytic, grad_jacobian_tok, rk_solve. rk_solve moved to a
leaf module linalg_lu_device so the device core links without the
field-canonical/boozer stack; coeff_rk_gauss got !$acc routine seq.

Device compile (nvfortran 26.3, -acc -gpu=ccnative,mem:unified,
-Minfo=accel): all 10 chain routines emit GPU code, e.g.

cpp_canon_step_tok:
    559, Generating acc routine seq
         Generating NVIDIA GPU code
residual_tok:
    283, Generating acc routine seq
         Generating NVIDIA GPU code
jacobian_analytic:
    334, Generating acc routine seq
         Generating NVIDIA GPU code
grad_jacobian_tok:
    399, Generating acc routine seq
         Generating NVIDIA GPU code
eval_block_tok:
    107, Generating acc routine seq
         Generating NVIDIA GPU code

Device RUN on an RTX 5060 Ti (cc120), test_cpp_canonical_device in an
!$acc parallel loop, device == host to round-off:

launch CUDA kernel ... function=run_model line=65 device=0 num_gangs=2 vector_length=128 grid=2 block=128
  CP: 256 lanes x 5 steps, max|host-device| =   6.6613E-16
PASS  CP device == host
  CPP_SYM: 256 lanes x 5 steps, max|host-device| =   3.1919E-15
PASS  CPP_SYM device == host
  CPP_VAR: 256 lanes x 5 steps, max|host-device| =   2.2649E-14
PASS  CPP_VAR device == host
 ALL CPP CANONICAL DEVICE TESTS PASSED

(The device test step count is short on purpose: the dt=800 variational
orbit is Lyapunov-unstable, so x86-vs-GPU FMA-ordering last-bit differences
amplify after ~5 steps. Five steps validate device==host step-for-step
across all three models.)

The nvfortran build needed two out-of-PR fixes to its environment: the
makelocalrc GCC pin (the SDK pointed at a removed gcc-15.2.1) and a
fetched-libneo coil_tools.f90 workaround for nf90_put_var(var%Re)
generic resolution. Neither is part of this PR. COORD_VMEC stays host-only
(libneo class() dispatch + 3D splines).

No regression

GC + full-orbit + CPP regression (gfortran Fast):

ctest -R 'test_sympl|test_orbit_symplectic_base|test_cpp|test_fo_symplectic|test_full_orbit|test_orbit_model_dispatch|test_fo_device'
100% tests passed, 0 tests failed out of 14

Full fast suite (make CONFIG=Fast, ctest, no regression label):

100% tests passed, 0 tests failed out of 70

Comment fix

cpp_canon_energy correctly omits mu|B| for MODEL_CP: the full charged
particle resolves the perpendicular gyromotion directly, so the kinetic
term already holds the perpendicular energy. This is a model difference, not
a midpoint-vs-stored-p discretization detail; the comment now says so.


Verification: production CPP wiring (ORBIT_CPP6D)

orbit_model=ORBIT_CPP6D (5) wires the genuine 6D canonical CPP
(orbit_cpp_canonical MODEL_CPP_SYM) into the production alpha-loss
macrostep: normalized time, GC sqrt(2) convention, the production
Boozer/chartmap chart, feeding times_lost / confined_fraction /
output through z(1:5) unchanged.

What changed

  • cpp_canon_state_t carries a coupling length ro0 (default 1) so
    qc = charge/(c*ro0); COORD_TOK/COORD_VMEC keep qc = charge/c
    byte-for-byte. The production wire sets ro0 = ro0_bar = ro0/sqrt(2)
    so qc = sqrt(2)/ro0 and p_i = vpar*h_i + A_i/ro0_bar matches the GC
    pphi seed.
  • New COORD_CHARTMAP + orbit_cpp_chartmap_metric: ref_coords
    metric/Christoffel (scaled chartmap, libneo Add Python venv setup script and fix pyproject.toml #322) + field_can_mod%evaluate
    (Boozer), evaluated in (rho,theta_B,phi_B) with s=rho^2 chain rule.
  • init_cpp and orbit_timestep_cpp_canonical in simple.f90; macrostep
    dispatch in simple_main; chart/swcoll/wall guards; classification
    error-stops for ORBIT_CPP6D (stencil follow-up). GC and ORBIT_PAULI
    paths byte-unchanged.

No GC regression (gfortran Fast)

$ ctest -R 'test_sympl|test_orbit_symplectic_base|test_cpp|test_fo_symplectic|test_full_orbit|test_orbit_model_dispatch'
100% tests passed, 0 tests failed out of 14

Full fast suite (make CONFIG=Fast, ctest, no regression label):

100% tests passed, 0 tests failed out of 71

New test (energy/mu gate + loss propagation on the production wire)

test/tests/test_cpp6d_vs_gc.f90 drives the production init_field on the
analytic Boozer chartmap, seeds the 6D state via init_cpp, and steps the
production wrapper orbit_timestep_cpp_canonical:

$ ./test_cpp6d_vs_gc.x
  ro0 (cm)    =   2.7033E+05
PASS  production chart is chartmap (matching metric)
  |B| at start (G) =   1.7238E+04
  Traced 5 / 5 GC macrosteps
  CPP6D max|dE/E0|      =   5.9189E-07
  mu drift |mu-mu0|/mu0 =   0.0000E+00
  final s = rho^2       =    0.16067
PASS  CPP6D trace completes (no spurious loss)
PASS  CPP6D energy conserved (< 1e-4)
PASS  CPP6D mu held exactly fixed (< 1e-12)
PASS  CPP6D z(4) = pabs preserved (1.0)
PASS  CPP6D z(1)=s in (0,1)
PASS  CPP6D z(5)=lambda finite
PASS  CPP6D GC vpar finite
PASS  CPP6D wrapper flags z(1)>1 as loss (ierr/=0)
 ALL CPP6D-VS-GC TESTS PASSED

Honest scope

The bundled analytic Boozer chartmap (test_boozer_chartmap.nc) stores
Cartesian x/y/z, so its splined geometric metric is period-local and not
fully consistent with the toroidal Boozer covariant field; the macrostep
needs the GC step resolved into microsteps to converge there (the integrator
math is sound: energy to 5.9e-7, mu exactly fixed). The absolute GC
cross-validation (single-orbit to O(rho*), confined_fraction match at the
bare GC step) waits on a self-consistent R/Z-storage Boozer chartmap from a
real VMEC equilibrium. Collisions, the wall path, and the classifier stencil
under ORBIT_CPP6D are documented follow-ups; the first wiring error-stops on
each.


Update: route the production CPP6D loss path to COORD_VMEC

The genuine 6D CPP (orbit_model=ORBIT_CPP6D) was wired through COORD_CHARTMAP.
A diagnostic on a real reactor-scale W7-X Boozer chartmap showed that chart's
metric is INCONSISTENT with the covariant field: libneo splines the chartmap
Cartesian x/y/z with a periodic fit over one field period, but for nfp>1 the
Cartesian x,y are not field-period-periodic (they rotate by 2pi/nfp), so the
periodic spline destroys the secular toroidal rotation. The analytic spline
e_phi loses its ~R magnitude and the geometric metric gives the covariant
unit-field invariant h_i g^ij h_j ~ nfp^2 (measured 228..472 at five sample
points) instead of 1. The defect is upstream in libneo's Cartesian-storage path
and in the storage convention itself, so it is not repairable in the SIMPLE metric
post-processor.

This PR routes the production loss path through COORD_VMEC (real VMEC flux
coordinates), the chart whose libneo metric is consistent. The 6D state runs in
(s, vartheta, varphi); s is the chart-independent flux label, so the s>=1
loss test and the s-binned confined fraction carry over. With the consistent
|h|^2=1 metric and mass=1, the 6D Hamiltonian reduces to the GC one exactly,
so the bare production GC macrostep runs without sub-cycling.

Failing-before (COORD_CHARTMAP, real W7-X chartmap)

Diagnostic at five interior surfaces (s=0.1,0.3,0.5,0.7,0.9):

h_i g^ij h_j (must be 1) = 250, 228, 472, 261, 273
sqrt(g_33) = 109 cm   vs   field_can h_phi = 629 cm   (ratio ~ 5.4 ~ nfp)
analytic spline |e_phi| = 109 cm   vs   true central-FD |e_phi| = 630 cm ~ R

magfie's own h_i h^i = 1.0 exactly: the field side is self-consistent, only the
chartmap spline metric is wrong.

Passing-after (COORD_VMEC, test/test_data/wout.nc)

test/tests/test_cpp6d_vs_gc.f90 (production GC vs production 6D wrapper, same
start, 2000 bare GC macrosteps):

  h_i g^ij h_j (must be ~1) =   1.00862110
  |B| (Gauss)               =   5.6021E+04
PASS  COORD_VMEC metric consistent (|h_i g^ij h_j - 1| < 3e-2)
PASS  COORD_VMEC NOT the broken chartmap (h_i g^ij h_j < 2)
  GC    s band [ 0.30000, 0.92979]
  CPP6D s band [ 0.30000, 0.88552]
  CPP6D max|dE/E0|      =   7.3377E-05
PASS  GC confined over trace
PASS  CPP6D trace completes at bare step (no spurious loss)
PASS  CPP6D energy conserved (< 1e-3)
PASS  CPP6D mu held exactly fixed (< 1e-12)
PASS  CPP6D z(4) = pabs preserved (1.0)
PASS  GC stays confined (0.05 < s < 0.95)
PASS  CPP6D stays confined (0.05 < s < 0.95)
PASS  CPP6D radial band tracks GC band (overlap, edges within 0.1)
PASS  CPP6D wrapper flags z(1)>1 as loss (ierr/=0)
 ALL CPP6D-VS-GC TESTS PASSED

h_i g^ij h_j: chartmap 228..472, COORD_VMEC 1.009 (FD Christoffel accuracy).

End-to-end loss run (simple.x, BOOZER on wout.nc, 32 particles, trace_time=1e-3 s)

GC    (orbit_model=0): total confined = 0.9688  (1 / 32 lost)
CPP6D (orbit_model=5): total confined = 0.9062  (3 / 32 lost)

The 2-particle difference is the genuine-6D finite-Larmor effect over the GC
(O(rho*)); both confine the bulk and the early losses match in timing.

No GC regression (gfortran Fast)

$ ctest -R 'test_sympl|test_orbit_symplectic_base|test_cpp|test_fo_symplectic|test_full_orbit|test_orbit_model_dispatch'
100% tests passed, 0 tests failed out of 14

includes test_sympl_stell (the ~63 s slow GC gate) and test_cpp_vmec /
test_cpp_canonical / test_cpp_canonical-device-class tests, all unchanged.

Full fast suite (make CONFIG=Fast test-fast):

100% tests passed, 0 tests failed out of 62

Notes

  • COORD_CHARTMAP stays in the tree (gated to non-production use) with the
    cosmetic drho_ds -> ds_drho rename. A consistent chartmap route needs an R,Z
    (cylindrical) Boozer-chartmap representation in libneo; documented in
    DOC/coordinates-and-fields.md section 6.6.
  • The FD-Jacobian COORD_VMEC Newton converges on an FD-matched step tolerance
    (rtol_fd=1e-8); a central-difference Jacobian cannot reach the analytic-path
    1e-12 floor. COORD_TOK/COORD_CHARTMAP keep the tight tolerance.

Verification: production CP wiring (ORBIT_CP6D)

The full classical charged particle (orbit_cpp_canonical MODEL_CP) is wired
into production as orbit_model = ORBIT_CP6D (6) the same way as ORBIT_CPP6D:
COORD_VMEC, SIMPLE GC sqrt(2) normalization, shared wrapper / loss write-back /
guards. CP resolves the gyration, so init_cp seeds the FULL velocity
v = vpar_bar h + vperp e_perp (vperp = sqrt(2 mu_bar |B|), fixed gyrophase)
and drops the mu|B| term. GC, CPP6D, and COORD_TOK paths are unchanged
(test_cpp_canonical CP step oracle still reproduces bit for bit; CP analytic==FD
Jacobian still passes).

npoiper2 determined by energy conservation

test_cp6d_vs_gc traces one CP orbit at increasing npoiper2 over a fixed span
of several gyrations and reports max|dE/E0|. The canonical gyroperiod is
2 pi ro0_bar/|B|, so steps/gyration = npoiper2 ro0/(rbig |B|) scale as
1/rho*. On test_data/wout.nc (ro0=2.70e5 cm, rbig=1.02e3 cm,
|B|=5.60e4 G, rho* = ro0/(rbig|B|) = 4.7e-3, gyroperiod 21.4 tau):

  npoiper2   steps/gyro   nsteps   max|dE/E0|
     512         2.42       15    3.0959E-01
    1024         4.83       29    1.1116E-01
    2048         9.66       58    3.2924E-02
    4096        19.33      116    8.7461E-03
    8192        38.65      232    2.2161E-03
   16384        77.31      464    5.5776E-04

Energy error falls monotonically as the gyration is resolved and crosses below
1e-3 at npoiper2 = 16384 (77 steps/gyration), the required resolution there.
A W7-X-class reactor case has a similar rho* ~ 1/200, so the same
npoiper2 ~ 1.6e4 order holds.

CP gyro-center tracks the GC; energy and mu conserved

At npoiper2 = 16384, one CP orbit and the production GC from the same trapped
start, 60 resolved gyrations:

  GC      s band [ 0.30000, 0.31093]
  CP-bar  s band [ 0.29499, 0.34212]
  CP max|dE/E0|                  =   6.2233E-04
  mu instantaneous ripple        =   8.2562E-02
  mu secular drift (half-means)  =   1.1099E-02
  gyro-center vs GC max |ds|     =   3.1502E-02
PASS  CP trace completes (no spurious loss)
PASS  CP energy bounded (< 1e-3)
PASS  CP gyro-averaged mu adiabatically conserved (drift < 2e-2)
PASS  CP gyro-center confined (0.05 < s < 0.95)
PASS  CP gyro-center tracks GC (max|ds| < 0.05)
PASS  CP6D wrapper flags z(1)>1 as loss (ierr/=0)
 ALL CP6D-VS-GC TESTS PASSED

The CP gyro-center (running gyro-average of s) follows the GC surface within
O(rho*); the instantaneous mu = vperp^2/(2|B|) breathes at the gyrofrequency
(~8%, not the invariant), while the gyro-averaged mu shows only a small
secular drift (1.1e-2, the O(rho*) estimate error of the single-gyrophase
sample).

Field-eval counter and version string

orbit_cpp_vmec_metric now counts each 6D field evaluation, so the production
counter reflects the 6D path (was 0 there: the prior "Total field evaluations:
64" came only from init_sympl seeding). version.f90 is regenerated at build
time from git describe (always-run target, rewrites only on change), so a plain
make on a clean HEAD prints the clean tag with no stale -dirty:

$ ./build/simple.x | head -1
SIMPLE version v1.5.0-97-g2345820

No GC regression (gfortran Fast)

$ ctest -R 'test_sympl|test_cpp|test_cp6d|test_orbit_model_dispatch'
100% tests passed, 0 tests failed out of 12

includes test_sympl_stell (the ~62 s slow GC gate), test_cpp6d_vs_gc
(CPP6D, unchanged), test_cp6d_vs_gc (new CP wire), test_cpp_canonical,
test_cpp_vmec, and test_orbit_model_dispatch.

Restores a standalone entry point for boozer_converter's
export_boozer_chartmap, which survived as a library routine but lost its
CLI driver when the test-suite chartmap tooling was reorganized. As an
app (not a test): reads field config from simple.in, builds the internal
Boozer field via init_field exactly as a tracing run does, and writes the
chartmap in the current rho+s schema (A_phi on the s abscissa). Enables a
direct field-level comparison of SIMPLE's VMEC->Boozer transform against
an external booz_xform chartmap.

Usage: export_boozer_chartmap.x <out.nc> [config.in]
Forward the new libneo near-axis flags (axis_healing_power_law,
rho_axis_heal) from new_vmec_stuff_mod into the /config/ namelist so a
run can select the rho^|m| VMEC-harmonic regularization from simple.in.

Depends on itpplasma/libneo#306.
Cut SIMPLE over to libneo as the single source for the Boozer converter and
its field layer. Delete the five files now owned by libneo (field_base,
field_vmec, vmec_field_eval, boozer_converter, boozer_chartmap_io) and drop
them from src/CMakeLists.txt; the identically-named libneo modules resolve in
their place. SIMPLE call sites and use statements are unchanged.

Requires libneo with the dual-compatible boozer_sub API (optional vmec_file,
optional trailing sqrt_g_ss_B) and the axis_healing_power_law/rho_axis_heal
symbols in new_vmec_stuff_mod.
Close the two blocking issues from the Wave-2 review.

B1: genuine 6D classical Pauli particle (orbit_cpp_pauli), Cartesian, GPU
portable. Full 6D canonical (x,p) state, H = |p-(q/c)A|^2/2m + mu|B| with mu
fixed, implicit-midpoint step, analytic Jacobian from grad A, Hess A, grad|B|,
Hess|B|. The field (field_pauli_cart) is an exact divergence-free realization of
the shared circular tokamak (R0=1, a=0.5, B0=1, iota0=1): B = curl A. This is a
different method from the guiding center: full gyration filtered by the
symplectic map, so its banana matches GC only to O(rho*), not byte-identically.

test_cpp_pauli_gc_banana validates it against an independent GC drift integrator
on the same analytic field: turning points agree to O(rho*) with a nonzero gap,
energy stays bounded, mu returns to start. Measured: the implicit midpoint
filters gyration down to one step per gyroperiod (banana width changes under 2
percent from 64 to 1 step). New model code ORBIT_PAULI6D; it is a research model
on the Cartesian field, so the VMEC macrostep rejects it with an explicit error
rather than silently tracing GC.

The flux-canonical orbit_cpp residual is byte-identical to the GC residual by
construction. Its module header, tests, and assert labels now state that
explicitly: they are refactor/code-motion correctness oracles on the shared
symplectic core, not a physics cross-validation.

B2: GPU-offload-ready full-orbit device path (orbit_full_device). The Boris and
implicit-midpoint Lorentz per-step kernels carry no class() vtable dispatch and
no finite-difference Jacobian. Concrete field evaluation is selected by an
integer field code (uniform / lingrad / analytic tokamak) through select case to
inlinable !$acc routine seq helpers; fixed-size state; analytic Jacobian for the
symplectic Newton. The class-based provider path in orbit_full stays for CPU
mock tests. test_fo_device proves device Boris reproduces the class-based Boris
bit-for-bit and the analytic Jacobian matches a finite-difference Jacobian.

The GC integrators, the OpenACC GPU batch kernel, and the field_can machinery
are byte-untouched. DOC/full-orbit-integration.md documents which models are
GPU-offload-ready and which are CPU-only.
… tokamak

Port the three Egger-Feiel thesis discrete-variational integrators to SIMPLE
as a curvilinear canonical-midpoint 6D scheme on the analytic tokamak,
superseding the Cartesian orbit_cpp_pauli discretization.

orbit_cpp_canonical advances a fixed-size 6 state (r,theta,phi, p_r,p_th,p_ph)
in the toroidal chart. Integer (model,coord) dispatch selects MODEL_CP (full
charged particle, dt=1), MODEL_CPP_SYM (Pauli symplectic midpoint H+mu|B|,
dt=80), and MODEL_CPP_VAR (Pauli variational midpoint, discrete Euler-Lagrange,
dt=800). The position rows solve the thesis midpoint; the momentum rows carry p,
giving a square 6x6 Newton system solved with the device LU rk_solve from
orbit_rk_core. Kernels are !$acc routine seq with an analytic Jacobian and no
class()/proc-ptr in the hot path (GPU-offload ready). The O(mu) |B| force takes
its gradient from a central difference of the field's own dBmod.

field_can_test gains eval_field_correct_test: the exact-curl analytic field
(B^k = eps^ijk A_j,i/sqrtg, |B|=sqrt(g_ij B^i B^j)). A, dA, d2A match
eval_field_test; B, dB, h differ (exact 0.99749 vs linearized 0.99293 at the
reference start). The GC path keeps eval_field_test.

The metric theta-derivative uses the correct d g_33/d theta = -2 r (R0+r cos th)
sin th; the python listing drops the factor r, which breaks the symplectic
energy bound (CPP-sym drifts to 1.4e-1 vs a bounded 1.0e-3 plateau).

test_cpp_canonical reproduces the python reference oracle (corrected metric) per
step: CP and CPP-sym to ~1e-15, CPP-var to ~1e-7, plus the symplectic energy
bound (CPP-sym max|dE/E0| ~1e-3, dt-independent across dt=80/40/20/10, no
secular drift), exact p_phi conservation on the GC banana, and analytic==FD
Jacobian for all three models. GC integrator untouched.
@krystophny krystophny added tier/T3 physics or output behavior size/L review size up to 1000 changed lines labels Jun 19, 2026
…es on VMEC

Replace the diagonal analytic-tokamak metric in orbit_cpp_canonical with a full
(non-diagonal) metric path so the cp/cpp_sym/cpp_var integrators run on real
VMEC flux coordinates and reduce to the analytic tokamak as a special case.

- block_t carries g_ij, g^ij, dg_ij,k, covariant A_i + gradient, |B| + gradient,
  h_i. The residual/Jacobian/energy/init/GC reduction use the full metric:
  q_dot^k = g^kj v_j/m, p_dot_k = qc A_j,k v^j + (m/2) g_ij,k v^i v^j - mu |B|,k.
- COORD_TOK fills the diagonal block inline, !$acc routine seq, class-free, with
  the analytic Jacobian. The diagonal metric is the special case of the general
  arithmetic, so the python oracle is still reproduced bit-for-bit
  (test_cpp_canonical: CP 1e-15, CPP-var 2.8e-8, energy plateau, analytic==FD).
- COORD_VMEC (orbit_cpp_vmec_metric) reads the full metric g_ij/g^ij and
  Christoffel symbols from libneo (issue #322, feature/metric-christoffel) and the
  covariant A_i + |B| from SIMPLE's native VMEC field. Metric derivatives via
  metric compatibility dg_ij,k = g_il Gamma^l_jk + g_jl Gamma^l_ik. Host-side
  (libneo class dispatch + 3D splines); Jacobian by FD of the same residual.
- Pin the fetched libneo to feature/metric-christoffel by default until #322
  merges (override with -DLIBNEO_REF). Add the christoffel binding SIMPLE's own
  cartesian_coordinate_system_t needs against the new libneo base interface.
- test_cpp_vmec runs CP + big-step CPP on test_data/wout.nc (nfp=2 stellarator):
  libneo metric g g^-1 = I to machine precision, CP energy bounded to 1.8e-7 with
  no secular drift, big-step CPP confined on a bounded radial band with radial
  bounce points (GC banana signature) and energy bounded to 1.9e-9. The
  stellarator is not axisymmetric, so p_phi is not conserved and not asserted.

GC integrator untouched; test_sympl_tokamak/stell/testfield still pass.
@krystophny krystophny changed the title Add 6D canonical-midpoint integrator (cp/cpp_sym/cpp_var) on analytic tokamak Generalize 6D canonical integrator to arbitrary curvilinear coordinates (VMEC) Jun 19, 2026
Correctness: the analytic-tokamak field carried over a wrong d|B|/dtheta
(field_correct_test.py dropped the A_theta,rth chain-rule term). Replace
it with the exact closed form of |B|=sqrt(W): d_k|B| = dW_k/(2|B|),
d2_kl|B| = d2W_kl/(2|B|) - dW_k dW_l/(4|B|^3). FD-verified to ~1e-9 at
several (r,theta). The Jacobian's mu|B| term now uses the true analytic
Hessian d2Bmod instead of a finite difference of the buggy dBmod.

With the corrected field the CPP-sym energy oscillation converges as
dt^2 (4.2e-5, 1.0e-5, 2.2e-6, 4.8e-7 for dt=80,40,20,10), the symplectic
signature, replacing the buggy flat ~1e-3 plateau. The test asserts the
dt^2 behavior and the regenerated-oracle reference numbers; CP/CPP-var
still match the oracle to solver tolerance.

GPU: cpp_canon_step_tok is the device entry. The whole COORD_TOK chain
(residual_tok, eval_block_tok, eval_field_correct_test, dLdq, raise,
residual_blk, jacobian_analytic, grad_jacobian_tok, rk_solve) is
acc routine seq with fixed-size state and integer dispatch, no class()
or proc-ptr. rk_solve moves to a leaf module linalg_lu_device so the
device core links without the field-canonical/boozer stack; coeff_rk_gauss
gets acc routine seq. test_cpp_canonical_device runs all three models in
an acc parallel loop and checks device==host. COORD_VMEC stays host-only
(libneo class dispatch + 3D splines).

Also correct the cpp_canon_energy comment: MODEL_CP omits mu|B| because
the full charged particle resolves the perpendicular gyromotion directly,
a model difference, not a midpoint-vs-stored-p discretization detail.
Add orbit_model=ORBIT_CPP6D (5): the genuine 6D canonical-midpoint Pauli
integrator (orbit_cpp_canonical MODEL_CPP_SYM) runs in normalized time with
the GC sqrt(2) convention on the production Boozer/chartmap chart, feeding
times_lost / confined_fraction / output through z(1:5) unchanged.

- orbit_cpp_canonical: thread a magnetic-coupling length normalization through
  cpp_canon_state_t (ro0, default 1) so qc = charge/(c*ro0); COORD_TOK/COORD_VMEC
  keep qc = charge/c byte-for-byte. Add COORD_CHARTMAP + eval_block_chartmap and
  a carried pabs field.
- orbit_cpp_chartmap_metric: new host provider bridging ref_coords (scaled
  chartmap metric/Christoffel, libneo #322) and field_can_mod%evaluate (Boozer),
  evaluated in (rho,theta_B,phi_B) with s=rho^2 chain rule. chartmap_metric_active
  gates the path.
- simple.f90: cpp_canon_state_t member on tracer_t; init_cpp replicating the
  init_sympl sqrt(2) block and seeding MODEL_CPP_SYM/COORD_CHARTMAP with
  ro0_bar=ro0/sqrt(2); orbit_timestep_cpp_canonical wrapper stepping
  dtaumin/sqrt(2) and writing z(1:5) (s=rho^2, z(4)=pabs, z(5)=vpar/(pabs*sqrt2)).
- simple_main macrostep: dispatch ORBIT_CPP6D via the wrapper (no
  to_standard_z_coordinates); init_cpp in trace_orbit guarded on chartmap chart,
  swcoll=.false., no wall. GC and ORBIT_PAULI paths unchanged.
- classification: error stop ORBIT_CPP6D (classifier stencil follow-up).
- orbit_full / params: ORBIT_CPP6D=5 constant and namelist comment.
- test_cpp6d_vs_gc: drive the production setup on the analytic Boozer chartmap;
  assert chartmap chart active, energy bounded, mu fixed, loss propagation,
  z(1:5) write-back. Extend test_orbit_model_dispatch with ORBIT_CPP6D.
- DOC/coordinates-and-fields.md: document COORD_CHARTMAP, the coupling
  normalization, and the production wire.
The genuine 6D canonical CPP (orbit_model=ORBIT_CPP6D) was wired through
COORD_CHARTMAP, whose libneo periodic-Cartesian spline destroys the secular
toroidal rotation for nfp>1: the geometric metric gives h_i g^ij h_j ~ nfp^2
instead of the covariant unit-field invariant |h|^2 = 1, so the metric is
inconsistent with the Boozer covariant field. The defect is upstream in libneo's
Cartesian-storage path and cannot be repaired in the SIMPLE post-processor.

Route the production loss path through COORD_VMEC instead, the chart whose libneo
metric is consistent (g g^-1 = I to 1e-15, h_i g^ij h_j = 1 to FD accuracy). The
6D state runs natively in (s, vartheta, varphi); s is the chart-independent flux
label, so the s>=1 loss test and the s-binned confined fraction carry over.

- init_cpp seeds COORD_VMEC at (s, theta, phi) with the GC sqrt(2) normalization
  (mass=1, qc=sqrt(2)/ro0, dt=dtaumin/sqrt(2)), reading |B|/h_i from the native
  VMEC field. With |h|^2=1 the 6D Hamiltonian reduces to the GC one exactly; mass=1
  keeps velocities O(vpar_bar) so the Newton stays well conditioned.
- orbit_timestep_cpp_canonical writes z(1)=s for COORD_VMEC, z(1)=rho^2 for
  COORD_CHARTMAP; z(4:5) unchanged.
- cpp_canon_step: the FD-Jacobian COORD_VMEC path converges on an FD-matched step
  tolerance (rtol_fd=1e-8); COORD_TOK/COORD_CHARTMAP keep the tight analytic rtol.
  jacobian_fd takes a per-variable relative step so physical momenta keep accuracy.
- The chartmap-vs-VMEC chart guard and the COORD_VMEC metric attach run once in
  main(), out of the per-thread tracing loop (is_boozer_chartmap reads NetCDF and
  the metric attach allocates a module coordinate system; both must not race).
- orbit_cpp_chartmap_metric: rename drho_ds -> ds_drho (it holds ds/drho=2 rho).

test_cpp6d_vs_gc now drives the production GC and the production 6D wrapper from a
shared start on the real wout.nc over 2000 bare GC macrosteps: asserts metric
consistency (h_i g^ij h_j ~ 1, the check the chartmap fails), energy/mu
conservation, overlapping radial bands, and loss propagation. An end-to-end
32-particle run gives CPP6D confined fraction 0.91 vs GC 0.97 at 1e-3 s.

DOC/coordinates-and-fields.md section 6.6 updated to the current reality.
Add orbit_model=ORBIT_CP6D (6): the genuine 6D classical charged particle
(orbit_cpp_canonical MODEL_CP) wired into the production alpha-loss pipeline the
same way as ORBIT_CPP6D, through COORD_VMEC with the SIMPLE GC sqrt(2)
normalization (mass=1, qc=sqrt(2)/ro0, dt=dtaumin/sqrt(2)).

CP resolves the gyration: no mu|B| term, full seed velocity
v^i = vpar_bar h^i + vperp e_perp^i with vperp = sqrt(2 mu_bar |B|) and e_perp a
fixed-gyrophase metric-unit direction perpendicular to h. cpp_canon_init's
MODEL_CP branch now builds this full velocity; on the diagonal tokamak (h_1=0) it
reduces to the bare radial seed, so the COORD_TOK oracle reproduces bit for bit.

init_cp in simple.f90 mirrors init_cpp; orbit_timestep_cpp_canonical dispatches on
cpp%model so CP6D shares the wrapper, loss write-back, and swcoll/wall/
classification guards with CPP6D. GC, CPP6D, and COORD_TOK paths are unchanged.

Because the gyration is resolved, CP6D needs a gyro-resolving step. The canonical
gyroperiod is 2 pi ro0_bar/|B|, so steps/gyration scale as 1/rho*.
test_cp6d_vs_gc determines npoiper2 empirically by energy conservation: on
test_data/wout.nc (rho*=4.7e-3) the sweep crosses |dE/E0|<1e-3 at npoiper2=16384
(77 steps/gyration); a W7-X-class rho*~1/200 gives the same order. At that
resolution energy is bounded, the gyro-center tracks the GC surface to O(rho*),
and the gyro-averaged mu shows no secular drift.

Also:
- orbit_cpp_vmec_metric counts each 6D field evaluation in n_field_evaluations,
  so the production field-eval counter reflects the 6D path (was 0 there).
- Regenerate version.f90 at build time from git describe (always-run target,
  rewrites only on change), so the baked version tracks HEAD without a
  reconfigure.
- Correct DOC/coordinates-and-fields.md: the COORD_VMEC h_i g^ij h_j=1.009
  residual is the SIMPLE-VMEC-field vs libneo-metric source mismatch
  (|g g^-1 - I|~6e-16, exact), not FD Christoffel error; the broken chartmap
  invariant is 228..472 (O(10^2)), not nfp^2.
## Summary
- document chartmap `sbeg`, radial-grid, handedness, sign, and runtime
scaling conventions
- clarify GVEC and booz_xform converter comments without changing
exporter numerics
- remove small lint blockers surfaced by `fo`

## Verification
- `python -m py_compile tools/gvec_to_boozer_chartmap.py
tools/booz_xform_to_boozer_chartmap.py` passed.
- `$HOME/code/prompts/scripts/check-writing-slop.py --threshold soft
README.md docs/boozer-chartmap-schema.rst` passed: `PASS: no
writing-slop candidates at threshold soft`.
- `/home/ert/.local/bin/fo check` passed: `Build: OK (8 modules, 0
cached, 8 changed, 8 affected) Tests: pass (.1s)`.
- `/home/ert/.local/bin/fo lint` passed: `no issues found`.
- Full `/home/ert/.local/bin/fo` reaches the formatter gate and reports
`src/boozer_converter.F90`, `src/field.F90`,
`src/field/field_newton.F90`, `src/get_canonical_coordinates.F90`, and
`src/util.F90`. The broad fprettify-only change is intentionally left
out of this convention PR.
Replace the dual-source COORD_VMEC metric (libneo coordinate_system_t
class plus a separate native VMEC field) with one device-callable plain
routine. The dual source gave h_i g^ij h_j = 1.009 because the two
metrics differed.

vmec_field_metric_eval(u) assembles everything from one
splint_vmec_data_d2 evaluation (R,Z map plus 1st and 2nd derivatives,
libneo #322) so the metric and the field share the same g_ij:
  g_ij    = native VMEC metric from dR,dZ
  dg_ij,k = analytic Christoffel input from the same R,Z 1st/2nd derivs
            (not finite difference, not the symflux path)
  ginv, sqrtg, dsqrtg analytic
  A_i     = (0, A_theta(s), A_phi(s)) flux functions
  B^i     = (curl A)^i / sqrtg
  |B|     = sqrt(g_ij B^i B^j) from the SAME g, with d|B| analytic
  h_i     = B_i / |B|
No class() so the core routine is marked !$acc routine seq. Only 1st and
2nd R,Z derivatives are used; no 3rd derivatives.

GATE test test_vmec_field_metric on test_data/wout.nc at five interior
points:
  worst |h_i g^ij h_j - 1| = 1.11e-15 (gate 1e-13)
  worst |dg analytic - dg central FD|, relative = 1.72e-11 (gate 1e-8)
The h_i g^ij h_j values are 1.000000000000000 at all points (largest
deviation 1.1e-15), the consistency the dual source failed.
Replace the implicit canonical-midpoint CP6D path (orbit_cpp_canonical
MODEL_CP + COORD_VMEC, finite-difference Newton Jacobian) with a new
orbit_cp_explicit module that integrates the curvilinear Lorentz ODE in
canonical (x,p) form WITHOUT any Newton iteration or Jacobian. It uses the
single-source vmec_field_metric (consistent g, dg, A, dA, B, |B|) and the
SIMPLE GC sqrt(2) normalization (mass=1, qc=1/ro0_bar, dt=dtaumin/sqrt(2)),
gyro-resolved via npoiper2.

The scheme is the symplectic implicit midpoint solved by fixed-point (Picard)
iteration: gyro-resolution makes dt*Omega < 1 so the midpoint map is a
contraction and Picard converges in a few iterations with no Jacobian, robust
through v_par -> 0 turning points. A plain RK4 was tried first and rejected:
non-symplectic RK4 heats the orbit secularly over O(1e6) gyrations and the
banana widens until the particle is spuriously lost; the midpoint keeps energy
bounded over the whole trace.

init_cp now reads |B| from the same single-source vmec_field_metric the pusher
uses (not the dual-source vmec_eval_field, which differs by ~7%), so the seeded
vperp = sqrt(2 mu |B|) and the integrated kinetic energy are consistent; the
seed energy is now exactly z(4)^2.

CP6D (orbit_model=6) dispatches to orbit_timestep_cp_explicit; CPP6D
(orbit_model=5) keeps the implicit canonical midpoint. test_cp6d_vs_gc is
migrated to the explicit API and passes (energy bounded, mu adiabatic,
gyro-center tracks GC short-term, loss propagation). No GC regression
(test_sympl* pass).

Known limitation: on the QA test_data/wout.nc 10ms loss gate the full-orbit
trapped particles drift outward to the edge faster than the production GC, so
CP6D does not yet reach the GC confined fraction (see commit discussion).
…ierr=2)

A full-orbit particle whose gyro-position maps outside the s<1 plasma is a
PHYSICAL edge loss, but cart_to_boozer clamps s to the boundary and the inversion
stalls, so it was reported as a numerical failure (cpp_lu_fail). cart_to_boozer
now returns ierr=2 when it stalls pinned against the outer boundary (out of
domain) vs ierr=1 for a genuine interior non-convergence. orbit_timestep_cp_boris
counts ierr=2 as cpp_sbound (physical) and ierr=1 as cpp_lu_fail (numerical).

Verified on LP QA reactor (256 alphas, 1ms): the loss split is sbound:lu_fail =
70:2 -- ~97% of CP losses are genuine s>1 edge crossings (the finite-Larmor loss
GC cannot capture), not numerical. Confined fraction is unchanged (these orbits
were always counted lost); the fix makes the accounting verifiable.
Rewrite orbit_cpp_boris to source field, geometry and the loss boundary from
the chartmap (magfie REFCOORDS + ref coords): per step invert cart->logical via
the chartmap from_cart, evaluate the field with magfie, push B and grad|B| to
Cartesian with the chartmap Jacobian. No boozer harmonic metric on this path.

Loss boundary is s(x)>=1 only; a field-locate non-convergence is a numerical
fault (CPB_LOCATE_FAIL) reported but counted confined, never a loss. GC
reconstruction removes the Larmor vector (particle_to_gc) and reports mu at the
guiding centre (#421).
…guards

- orbit_cpp_boris: field/geometry from ref_coords (chartmap) + chartmap_eval_field
  (field_can), the same field the GC uses; B_cart = Jc(|B| g^ij h_j),
  grad|B|_cart = Jc^-T d|B|/du. Larmor seed/readout via ref_coords Jacobian.
- init_cp_boris: |B| at the GC from chartmap_eval_field, not magfie.
- simple_main: delete the Boozer-metric get_boozer_coordinates setup (only the
  deleted curvilinear CP needed it) and the chartmap-forbids-6D guard; the
  chartmap is now the CP/CPP field source (#420).

Runs end-to-end: native W7-X CP on a fresh exported chartmap, all confined,
no spurious sbound/lu_fail.
…obust seed

- Invert cart->logical with a warm-started damped Newton on the chartmap forward
  map (evaluate_cart/covariant_basis), seeded from the carried u: 1-2 iters,
  thread-safe, converges to the spline floor.
- Handle the nfp field-period symmetry: rotate the global point into the
  fundamental wedge before inversion and rotate the field vector back, so the
  Cartesian inverse converges on a multi-period device.
- Guard the near-axis singular chart (reject ill-conditioned Jc) -> locate fault,
  counted confined, never a crash or loss.
- Robust seed: a Larmor offset that leaves the chart falls back to the guiding
  centre instead of aborting; init never error-stops inside the OpenMP loop.

Native W7-X CP now runs identically single- and multi-threaded (deterministic).
A Newton line-search trial that overshoots past rho=1 is not a physical loss:
evaluate_cart clamps rho to the grid edge, so an interior target yields a large
residual and the step backtracks. Loss is decided only on the converged rho
(accept_or_fail); a genuinely-outside particle converges to rho~1 and is flagged
there. Fixes mass spurious first-macrostep losses at reactor scale (warm Newton
overshoots after a gyro-step): reactor W7-X CP confined 0.34 -> 0.78.
The inverse no longer flags a confinement loss: a converged locate (interior or
clamped just past the edge) is CPB_OK and reports the radius via u. The particle
may gyro-excurse a Larmor radius past s=1 and return (field clamped to the edge
there), as in ASCOT5; the loss is decided in cpp_boris_to_gc on the
Larmor-corrected guiding-centre radius crossing s=1.
Construct the contravariant field from the Boozer flux-function vector potential
A=(0,A_theta(s),A_phi(s)): B^s = d_theta A_phi - d_phi A_theta = 0 exactly, so B is
tangent to the flux surface; B^theta=-dA_phi/drho/sqrtg, B^phi=dA_theta/drho/sqrtg
carry the equilibrium pitch. Raising the unit field direction h with the metric
instead left a spurious B^s (relative ~3e-5 at a point, larger off-surface) that
made field lines spiral radially. Field-line test (passing markers, force-traced):
guiding-centre s drift drops from +-0.05..0.085 to +-0.01..0.03. The trapped-orbit
reactor over-loss (confined ~0.78 vs GC/ASCOT 0.90) persists and is a separate
mechanism.
…nas)

metric_tensor returns sqrtg = sqrt(|det|) (unsigned). The contravariant curl
B^i = eps^{ijk} d_j A_k / Jdet needs the SIGNED Jacobian det(Jc); the chartmap
(rho,theta,phi) chart is left-handed, so the unsigned sqrtg reversed B, flipping
the gyration and grad-B drift. Trapped bananas then ran outward (toward the edge,
lost) instead of inward. With det(Jc): single-orbit trapped CP bananas match the
GC (e.g. s 0.50->0.20 inward), and reactor W7-X CP confined 0.78 -> 0.862 (GC
0.905, ASCOT FO 0.898).
Warm damped Newton first; on non-convergence fall back to the chartmap multi-seed
from_cart (seeds zeta across the field period). Rescues some field-period-seam
stale-guess failures. Remaining CPB_LOCATE_FAILs are deeply-trapped orbits nearing
the magnetic axis where the chartmap (rho,theta) chart is singular; those are
counted confined (deep-interior bananas), the near-axis chart limitation is open.
…dual loss test)

The Cartesian Boris CP spuriously flagged ~10% of reactor W7-X markers lost
within 1e-5 s while GC and ASCOT5 confine them. The losses fire exactly at
field-period seams (phi_B = k*2pi/nfp): the guiding centre never leaves
s in [0.49,0.51] for hundreds of steps, then is declared lost in one step.

Two causes in the chartmap inversion (libneo's chartmap is seam-clean):

1. Stale warm guess across the seam. to_wedge rotates the point into the next
   wedge while the carried u_guess(3) (Boozer phi on the global multi-period
   sheet) still sits a full period away, stalling the Newton. Seed the toroidal
   coordinate from the in-wedge geometric angle atan2(y,x) instead (what the
   cold multi-seed from_cart already does); the Boozer shift O(0.1 rad) keeps
   convergence at 1-2 iters.

2. Clamped-edge stall mis-read as a loss. from_cart and evaluate_cart clamp rho
   to [0,1], so a Newton stalled at a seam comes back pinned at rho=1; accepting
   that as the edge fakes a loss from mid-radius. Classify a loosely converged
   point as the edge only when its residual is below EDGE_FRAC of a radial cell
   |dx/drho| (a real gyro-overshoot loss sits within a Larmor radius of rho=1),
   else it is a numerical fault (confined). Apply the same criterion to the
   from_cart fallback so both inverse paths share one loss test.

Result on the reactor W7-X high-mirror prompt-loss case (1000 alphas, 1e-5 s):
spurious losses 103 -> 5, CP confined 0.853 -> 0.990 (GC 1.000, ASCOT 1.000),
genuine ASCOT-matched losses preserved. Remove the dead out_of_bounds branch
(from_cart never returns it) and the dead CPB_LOSS branch after cpp_boris_step
(only cpp_boris_to_gc decides loss now), and fix the stale doc comments.

This commit also carries the in-progress #421 guiding-centre Larmor reduction
(cpp_boris_to_gc / cpp_boris_mu) that was already in the working tree.
…seed choice)

No behaviour change from b3e4b59 (5 residual spurious, CP confined 0.990 on the
reactor W7-X 1e-5 s case). Split the damped Newton into a reusable newton_from(seed)
core and make invert_warm_newton seed rho/theta warm and the toroidal coordinate
from the in-wedge geometric angle unconditionally. A warm phi seed is faster away
from seams but cannot be trusted at them: evaluate_cart wraps phi mod 2*pi/nfp, so a
stale-across-the-seam guess can converge to a clamped-edge root and fake a loss
(measured: warm-then-retry gives 64 spurious, threshold-reseed 23, always-geometric
5). Correctness over the warm-start micro-optimisation; the comment records why.
…-style mode

Same Cartesian (x, v) full-orbit CP as ORBIT_CP6D_BORIS with the same chartmap field
(cart_field) and guiding-centre loss test (cpp_boris_to_gc), but advanced by the
error-controlled adaptive Cash-Karp stepper (odeint_allroutines, the integrator the
GC RK path already uses) instead of the fixed Boris step. This is the independent
cross-check on the Boris fixed-step phase error, matching the adaptive gyro-orbit
ASCOT5 runs as reference.

cpp_rk_step integrates one macrostep of the Lorentz ODE dx/dt = v, dv/dt = qcm v x B
to relative tolerance st%rtol (= namelist relerr); cp_rk_rhs sources B from the same
chartmap and carries a threadprivate warm guess updated per sub-step (the adaptive
step can move many gyroradii). Wired through orbit_model = 8 in simple/simple_main
(validate, field setup, init via init_cp_boris, macrostep dispatch).

Known limitation: at a coarse macrostep the adaptive sub-steps overshoot the LCFS and
cart_field faults there rather than returning the clamped-edge field, so the macrostep
aborts as a numerical fault; a fine macrostep (npoiper2 like Boris) keeps excursions
small and the fault rate Boris-like, but is slow for edge orbits. Graceful past-edge
field handling and event-based loss detection (odeint ode_event_t) would let it use
coarse adaptive steps; tracked for follow-up.
…s test

The chartmap field matches ASCOT5's B_STS to 0.01% in |B| and 0.01deg in direction,
the integrator is equivalent (both fixed-step, SIMPLE finer), relativity is negligible
(gamma-1=9.4e-4) and energy is machine-exact, so the residual CP-vs-ASCOT over-loss is
in the loss detector, not the orbit (see benchmark run_simple/.../orbit_cmp).

The loss keys on the Larmor-corrected guiding centre u_gc (the particle, gyro-excursed
past s=1, is off-chart and cannot be inverted), but u_gc is a second cold-guess locate
of x_gc and carries the residual field-period-seam noise, occasionally returning rho>=1
for a particle that is at mid-radius. Confirm a flagged loss with the robust warm-started
particle locate u_p: a real loss has the particle within a Larmor radius of the edge
(|x_gc-x| is a Larmor radius), so u_gc>=1 while u_p is well inside (GC_PARTICLE_GAP=0.05,
several Larmor radii) is a reconstruction glitch, not a loss. The guiding centre is still
reported in (s,theta,phi). At 1e-5 s spurious losses 5 -> 3, CP confined 0.990 -> 0.996.
@krystophny

Copy link
Copy Markdown
Member Author

Status update: branch pivoted to Cartesian-chartmap CP/CPP, now matches ASCOT5

This branch has moved well past the original title (VMEC curvilinear 6D integrator).
The CP/CPP 6D full orbit now integrates in Cartesian (x, v) with the chartmap
as field+geometry source (orbit_cpp_boris.f90), per issues #419/#420/#421. Summary
of where it stands:

Validation (reactor W7-X high-mirror, 1000 alphas, 1 ms): SIMPLE CP full orbit
confined fraction 0.896, matching ASCOT5 full orbit (0.898) within the 1000-marker
statistics; SIMPLE GC 0.905, ASCOT5 GC 0.904. CP went 0.794 -> 0.875 -> 0.896 as two
purely-numerical bugs in the chartmap inversion were fixed:

  1. 86b2b2f (+ b3e4b59, 5df5a1d) -- field-period-seam Newton stall faking losses:
    in-wedge geometric toroidal seed + residual-vs-radial-cell loss criterion.
  2. 703404f -- the guiding-centre loss reconstruction locate(x_gc) carried residual
    seam noise; a flagged loss is confirmed against the robust particle locate.

The orbit-level cross-check (benchmark cartesian_cp_test/orbit_cmp/) ruled out every
physics difference: the chartmap field matches ASCOT's B_STS to 0.01% in |B|, 0.01
deg in direction
; the integrators are equivalent (both fixed-step Boris -- ASCOT FO
is simulate_fo_fixed, SIMPLE finer at ~38 vs 20 steps/gyroperiod); relativity is
negligible; energy is machine-exact.

New: 5df5a1d adds ORBIT_CP6D_RK = 8, an adaptive Cash-Karp RK45 full-orbit CP
(libneo odeint_allroutines), the ASCOT5-style independent integrator cross-check.

Long runs: GC / CP-Boris / CP-RK at 100 ms and 1 s are scheduled on acluster (7-day
SLURM walltime). libneo chartmap inversion fixes are in PR itpplasma/libneo#325.

Suggest retitling this PR to reflect the Cartesian-chartmap CP/CPP port, or splitting
the merged work out. Benchmark and full orbit-level writeup: private
itpplasma/benchmark-orbit-proxima#2.

Unify the times_lost.dat convention across integrators. The CP/CPP chartmap paths
produce ierr_orbit==3 field-locate faults (near-axis), which GC never has; the fault
branch wrote times_lost=-1 for these confined survivors while every clean-confined
particle (GC included) gets trace_time. -1 then collided with the never-traced
(skipped passing) marker value, so a script keying 'confined' on times_lost==trace_time
miscounted CP. A fault is not a loss and the particle is confined, so record trace_time
like GC; the fault is tracked by the diag counters. Verified: GC and CP now write an
identical convention (309 skipped t=-1, rest confined t=trace_time, lost 0<t<trace_time).
Adaptive RK45 full-orbit CP scores confined particles as lost: adaptive
substeps overshoot the LCFS and the chartmap field faults, aborting the
macrostep. validate_orbit_model_config now error-stops on ORBIT_CP6D_RK
and directs users to ORBIT_CP6D_BORIS until graceful past-edge field and
odeint loss-event handling land.
## What

Reindent 31 files with `fprettify` to the formatting the 6D-port branch
(#408) already uses, so that PR's diff carries real changes only. This
is
formatting churn lifted out ahead of it: ~7600 of #408's ~18900 changed
lines are reindentation.

`fprettify` 0.3.7 with the repo `.fprettify` config (4-space indent, 88
cols) produces these files from `main`. `get_canonical_coordinates.F90`
is
the branch's whole-file reformat, taken verbatim (zero real change).

## No semantic change

```
$ git diff -w --ignore-blank-lines origin/main
(empty)
```

No token changes anywhere: only indentation and blank-line
normalization.
Build passes (`make CONFIG=Fast`).

## After merge

Rebasing #408 on top drops the reformatted files from its diff, since
the
formatting matches. The seven files `fprettify` does not reproduce
(`orbit_symplectic`, `classification`, `simple`, `params`,
`field_can_test`,
`simple_main`, plus the field-can hand-formatting) stay in #408 with
their
real changes.
Brings in the fprettify formatting split (#425). Resolved every conflict
to the branch side: main's contribution to those files is formatting only,
already present here, so the merge tree is byte-identical to the pre-merge
branch tip. Drops ~3750 lines of reindentation from this branch's diff.
## Risk tier

- [x] T3: physics, output behavior, coordinate convention

## Correctness contract

### Intended behavior change

Adds a new orbit model: `orbit_model = 7` (`ORBIT_FULL_ORBIT`) selects a
gyro-resolved full-orbit (FO) Boris pusher, the ASCOT-style counterpart
to the
guiding-center (GC) model. A charged particle advances by explicit Boris
drift-rotate-drift in Cartesian (x, v); field and geometry come from the
production Boozer field through the chartmap (B = curl A, tangent to the
flux
surface), with the magnetic axis healed by a pseudo-Cartesian (X,Y,phi)
chart.
Output `z(1:5)` is the guiding-centre reduction each step, so
`times_lost` and
`confined_fraction` are written exactly as for GC.

### Behavior that must not change

`orbit_model = 0` (guiding center, the default) is untouched: the
symplectic GC
path, its init, and its macrostep are unchanged. No existing test output
moves.

### Coordinate / unit conventions

Cartesian (x, v) in the scaled chartmap frame; logical chart u=(rho,
theta_B,
phi_B) with rho = sqrt(s). CGS-style normalized velocity with the GC
sqrt(2)
convention, matching the GC seed so both integrators start from the
identical
particle. `orbit_coord = 1` (Boozer) is required and enforced.

### Numerical invariants

Energy is conserved to machine precision (the Boris rotation is exact
for
constant B over a step; no electric field here). The only confinement
loss is
the guiding-centre crossing s >= 1; a field-inversion non-convergence is
a
numerical fault (`fo_fault`), reported but never counted as a loss.

## Tests added

- unit:
- integration: `test_fo_boris` (energy bound < 1e-3, near-axis crossing,
no spurious loss; passing/trapped/near-axis on the reactor-scale Boozer
field)
- system:
- golden record:

## Golden-record impact

- [x] unchanged

GC is the default and is untouched; FO is opt-in via `orbit_model = 7`.

## Failure modes considered

- Field-inversion non-convergence near a field-period seam: classified
as a
numerical fault (`fo_fault`), never a loss, so a mid-radius particle is
not
  spuriously lost.
- Near-axis singularity of the polar chart: healed by the
pseudo-Cartesian
(X,Y,phi) chart; the near-axis test orbit crosses s in [0.04, 0.07] with
  energy bounded.
- Unsupported combinations (wall loss, collisions, orbit classification,
or
orbit_coord /= 1) stop at config validation with a clear message instead
of
  running silently wrong.

## Manual validation

Reactor-scale W7-X benchmark (separate run): FO confined fraction agrees
with
ASCOT5 full orbit (0.894 vs 0.898 at 1 ms, 1000 alphas), and with GC
within
marker statistics.

## Verification

Before: there is no full-orbit model on `main`; `orbit_model` and the FO
modules
do not exist.

```
$ git grep -l 'ORBIT_FULL_ORBIT\|orbit_fo_boris' origin/main -- src test
(no matches)
```

After: `test_fo_boris` passes and the fast suite is green.

```
$ make test TEST=fo_boris
1/1 Test #6: test_fo_boris ....................   Passed    6.01 sec

  passing   s band [0.5000,0.7700]  max|dE/E0|=2.22E-14  ierr_lost=0
  trapped   s band [0.4999,0.5894]  max|dE/E0|=1.11E-14  ierr_lost=0  drift=8.85E-03
  near-axis s band [0.0400,0.0708]  max|dE/E0|=8.88E-15  ierr_lost=0
  ALL FO-BORIS TESTS PASSED

$ make test-fast
100% tests passed, 0 tests failed out of 53
```

## Relationship to #408

This is the working full-orbit Boris slice, extracted clean off `main`.
The
experimental models on #408 (CPP Pauli, the adaptive RK45 ORBIT_CP6D_RK,
the 6D
canonical CP/CPP, device/mock variants) stay on that branch; #408
rebases on top
once this merges.
Brings in the merged full-orbit (FO) Boris pusher (orbit_model=7,
orbit_fo_boris/orbit_fo_field, test_fo_boris). The four files both branches
touch (params, diag_counters, simple, simple_main) kept this branch's
experimental CP/CPP/RK code; the FO modules build alongside under distinct
names and test_fo_boris passes. ORBIT_FULL_ORBIT vs this branch's
ORBIT_CP6D_BORIS both claim model 7, so the field/dispatch dedup (adopt FO as
the canonical model 7, renumber the experimental models) is a follow-up.
@krystophny

Copy link
Copy Markdown
Member Author

Closing as retired. The curvilinear/canonical full-orbit-port investigation is superseded: the working full orbit shipped as FO (orbit_model=7, src/orbit_fo_boris.f90) via #426, and the curvilinear CP/CPP path is retired per #420 (move to Cartesian); the canonical-CPP bounce-resonance issue is tracked in #417. Branch preserved as z_archive/.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/L review size up to 1000 changed lines tier/T3 physics or output behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant